Real-world data tends to be heavily imbalanced and severely skews data-driven deep neural networks, which makes Long-Tailed Recognition (LTR) a massively challenging task. Existing LTR methods seldom train Vision Transformers (ViTs) with Long-Tailed (LT) data, while off-the-shelf pretrained weights of ViTs always lead to unfair comparisons. In this paper, we systematically investigate the performance of ViTs in LTR and propose LiVT to train ViTs from scratch with LT data only. Observing that ViTs suffer more severe LTR problems, we conduct Masked Generative Pretraining (MGP) to learn generalized features. With ample and solid evidence, we show that MGP is more robust than supervised pretraining. In addition, Binary Cross Entropy (BCE) loss, which performs conspicuously well with ViTs, encounters predicaments in LTR. We further propose the balanced BCE to ameliorate it with strong theoretical grounding. Specifically, we derive the unbiased extension of the Sigmoid function and compensate with extra logit margins to deploy it. Our Bal-BCE contributes to the quick convergence of ViTs in just a few epochs. Extensive experiments demonstrate that with MGP and Bal-BCE, LiVT successfully trains ViTs well without any additional data and significantly outperforms comparable state-of-the-art methods, e.g., our ViT-B achieves 81.0% Top-1 accuracy on iNaturalist 2018 without bells and whistles. Code is available at https://github.com/XuZhengzhuo/LiVT.
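The logit-margin compensation described above can be illustrated with a short sketch. Here the per-class margin is assumed to be the log-odds of the class prior, based only on the abstract's description of the unbiased Sigmoid extension; this is not the paper's released code, and `balanced_bce_loss` is our own name.

```python
import torch
import torch.nn.functional as F

def balanced_bce_loss(logits, targets, class_counts):
    # Class prior pi_j = n_j / N under the long-tailed label distribution.
    prior = class_counts / class_counts.sum()
    # Shift each logit by log(pi_j) - log(1 - pi_j): a sketch of the
    # "extra logit margin" of the unbiased Sigmoid extension (assumption).
    margin = torch.log(prior) - torch.log1p(-prior)
    return F.binary_cross_entropy_with_logits(logits + margin, targets)
```

With a heavily skewed `class_counts`, the margin pushes tail-class logits up before the Sigmoid, counteracting the prior bias.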
Accurate vehicle type classification plays an important role in intelligent transportation systems. It is important for authorities to understand road conditions, and it commonly contributes to traffic-light control systems that respond accordingly to alleviate congestion. New technologies and comprehensive data sources, such as aerial photography and remote sensing data, provide richer and higher-dimensional information. Likewise, owing to the rapid development of deep neural network technology, image-based vehicle classification methods can better extract the underlying objective features from such data. Recently, several deep learning models have been proposed to solve this problem. However, traditional purely convolution-based approaches have limitations in global information extraction, and complex environments, such as bad weather, severely restrict their recognition capability. To improve vehicle type classification under complex environments, this study proposes a novel Densely connected convolutional Transformer-in-Transformer neural network (Dense-TNT) framework, built by stacking Densely connected convolutional network (DenseNet) and Transformer-in-Transformer (TNT) layers. Data from three regions under four different weather conditions are deployed to evaluate recognition capability. Experiments find that the recognition ability of our proposed vehicle classification model degrades little even under heavy fog.
Forecasting electric vehicle (EV) charging demand and charging-station availability is one of the challenges in intelligent transportation systems. With accurate EV station occupancy prediction, suitable charging behavior can be scheduled in advance to relieve range anxiety. However, due to complex road network structures and comprehensive external factors, such as points of interest (POIs) and weather effects, many existing deep learning methods for this problem can only extract historical usage patterns without considering the comprehensive influence of external factors. To improve prediction accuracy and interpretability, this study proposes the Attribute-Augmented Spatial-Temporal Graph Informer (AST-GIN) structure, which combines Graph Convolutional Network (GCN) layers and Informer layers to extract the external and internal spatial-temporal dependencies of the relevant transportation data. External factors are modeled as dynamic attributes and trained by an attribute-augmented encoder. The AST-GIN model was tested on data collected in Dundee City, and the experimental results show the effectiveness of our model, which accounts for external factors, over other baselines across various horizon settings.
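The spatial half of the AST-GIN structure, a graph convolution over the station network, can be sketched as follows. This is a generic GCN layer with symmetric normalization, not the paper's code; the Informer temporal layers and the attribute-augmented encoder are omitted, and all shapes are illustrative.

```python
import torch

def gcn_layer(adj, features, weight):
    # Add self-loops, then apply symmetric normalization:
    # D^{-1/2} (A + I) D^{-1/2}, the standard GCN propagation rule.
    a = adj + torch.eye(adj.size(0))
    d_inv_sqrt = a.sum(dim=1).pow(-0.5)
    a_norm = d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)
    # Propagate node (station) features and project with a weight matrix.
    return torch.relu(a_norm @ features @ weight)
```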
We examine the inducement of rare but severe errors in English-Chinese and Chinese-English in-domain neural machine translation through minimal deletions of the source text with character-based models. By deleting a single character, we find that we can induce severe errors in the translation. We categorize these errors and compare the results of deleting single characters and single words. We also examine the effect of training-data size on the number and types of pathological cases induced by these minimal perturbations, finding significant variation.
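The minimal-deletion probe described above is straightforward to reproduce. A minimal sketch that enumerates every single-character deletion of a source sentence (the translation model each variant would be fed to is omitted):

```python
def single_char_deletions(text):
    # Enumerate all source variants with exactly one character removed,
    # the minimal perturbation used to probe for pathological translations.
    return [text[:i] + text[i + 1:] for i in range(len(text))]
```

Each variant differs from the original by one character, so any severe divergence in its translation can be attributed to that single deletion.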
The ability to modulate the stiffness of soft actuators plays a vital role in improving the efficiency of interaction with the environment. However, with unidirectional stiffness-modulation mechanisms, high lateral stiffness and a wide range of bending stiffness cannot be guaranteed simultaneously. Therefore, we draw inspiration from the anatomy of the human finger and propose a soft actuator with bidirectional tunable stiffness (BTSA). BTSA consists of air-tendon hybrid actuation (ATA) and bone-like structures (BLS). The ATA can tune the bending stiffness from 0.2 N/mm to 0.7 N/mm, about 3.5 times. The BLS can enhance the lateral stiffness up to 4.2 times compared with the actuator without BLS. Meanwhile, the lateral stiffness can be modulated within a certain range (e.g., from 0.35 N/mm to 0.46 N/mm when the bending angle is 45 degrees). The BLS is designed according to a simplified stiffness analysis model, and a wax-based fabrication method is proposed to ensure airtightness. Experiments on fingertip force, bending stiffness, and lateral stiffness are conducted to validate these properties.
Existing methods for spectral reconstruction usually learn a discrete mapping from RGB images to a number of spectral bands. However, this modeling strategy ignores the continuous nature of spectral signatures. In this paper, we propose Neural Spectral Reconstruction (NeSR) to lift this limitation by introducing a novel continuous spectral representation. To this end, we embrace the concept of implicit functions and implement a parameterized embodiment with a neural network. Specifically, we first adopt a backbone network to extract spatial features of the RGB input. Based on them, we design a Spectral Profile Interpolation (SPI) module and a Neural Attention Mapping (NAM) module to enrich the deep features, where the spatial-spectral correlation is involved for better representation. Then, we take the number of sampled spectral bands as the coordinate of the continuous implicit function, so as to learn the projection from deep features to spectral intensities. Extensive experiments demonstrate the distinct advantage of NeSR over baseline methods in reconstruction accuracy. Moreover, NeSR extends the flexibility of spectral reconstruction by enabling an arbitrary number of spectral bands as the target output.
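The continuous spectral representation can be sketched as a small implicit head: a hypothetical MLP (`SpectralImplicitHead` is our name, not the paper's) that takes a per-pixel deep feature together with a normalized band coordinate and predicts an intensity, so any number of bands can be queried. Layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class SpectralImplicitHead(nn.Module):
    # Maps (per-pixel deep feature, continuous band coordinate) -> intensity,
    # so an arbitrary number of spectral bands can be sampled at test time.
    def __init__(self, feat_dim, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, feats, band_coords):
        # feats: (N, feat_dim); band_coords: (B,) normalized to [0, 1].
        n, b = feats.size(0), band_coords.size(0)
        f = feats.unsqueeze(1).expand(n, b, feats.size(1))
        c = band_coords.view(1, b, 1).expand(n, b, 1)
        return self.mlp(torch.cat([f, c], dim=-1)).squeeze(-1)  # (N, B)
```

Querying 31 or 61 band coordinates from the same features yields outputs of different spectral resolutions without retraining, which is the flexibility the abstract claims.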
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point-cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT exhibits strong robustness even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
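One plausible way to fold 3D point coordinates into token features, in the spirit of the implicit alignment described above, is a sinusoidal coordinate encoding. This is a hedged illustration under our own assumptions (frequency choice, output layout), not CMT's actual implementation.

```python
import torch

def encode_3d_points(points, dim=64):
    # Sinusoidal encoding of 3D coordinates: each of x, y, z is expanded
    # into sin/cos pairs at geometrically spaced frequencies, giving a
    # feature that can be added to image or point tokens (illustrative).
    freqs = 2.0 ** torch.arange(dim // 6, dtype=torch.float32)
    angles = points.unsqueeze(-1) * freqs            # (N, 3, dim // 6)
    enc = torch.cat([angles.sin(), angles.cos()], dim=-1)
    return enc.flatten(1)                            # (N, 3 * 2 * (dim // 6))
```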
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
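The NAIVEATTACK idea, stamping a trigger onto the raw data before distillation begins, can be illustrated with a minimal sketch. The patch placement, shapes, and blending factor here are our assumptions, not the paper's exact setup.

```python
import numpy as np

def add_trigger(images, trigger, alpha=1.0):
    # Stamp a small trigger patch into the bottom-right corner of every
    # image prior to distillation (NAIVEATTACK-style injection; hypothetical
    # placement). alpha blends the trigger with the original pixels.
    out = images.astype(float).copy()
    th, tw = trigger.shape[:2]
    out[:, -th:, -tw:] = (1 - alpha) * out[:, -th:, -tw:] + alpha * trigger
    return out
```

DOORPING, by contrast, would update `trigger` itself between distillation iterations rather than keeping it fixed.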
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes from only a limited number of support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features within a Transformer-like framework. Our key insights are twofold: First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., the feature level and the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modifications. Benchmarking on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and models will be available.
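The mask-based dynamic class center and query re-weighting can be sketched as masked average pooling followed by a channel-wise gate. This is a hedged illustration of the feature-level step only, with illustrative shapes; it is not the released RefT code.

```python
import torch

def reweight_query_features(support_feats, support_masks, query_feats):
    # Masked average pooling over support features yields a dynamic class
    # center; a sigmoid of that center then gates query features per channel.
    m = support_masks.unsqueeze(1)                         # (S, 1, H, W)
    center = (support_feats * m).sum(dim=(0, 2, 3)) / m.sum().clamp(min=1e-6)
    weights = torch.sigmoid(center).view(1, -1, 1, 1)      # (1, C, 1, 1)
    return query_feats * weights
```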
This paper focuses on designing efficient models with low parameter counts and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of the Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance despite sharing the same framework. Motivated by this phenomenon, we deduce a simple yet efficient modern \textbf{I}nverted \textbf{R}esidual \textbf{M}obile \textbf{B}lock (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependencies and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase \textbf{E}fficient \textbf{MO}del (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, \eg, our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 accuracy, surpassing \textbf{SoTA} CNN-/Transformer-based models, while trading off model accuracy and efficiency well.
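An iRMB-style block, pairing a depthwise convolution (short-distance, CNN-like) with self-attention (long-distance, Transformer-like) inside one inverted residual, might be sketched as follows. Layer sizes, ordering, and the use of standard multi-head attention are illustrative assumptions, not EMO's exact design.

```python
import torch
import torch.nn as nn

class IRMBSketch(nn.Module):
    # A hedged sketch of an inverted-residual mobile block: self-attention
    # for long-distance interactions, then an expand -> depthwise conv ->
    # project path for short-distance dependency, each with a residual.
    def __init__(self, dim, expand=4, heads=4):
        super().__init__()
        hidden = dim * expand
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.expand = nn.Conv2d(dim, hidden, 1)
        self.dw = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.proj = nn.Conv2d(hidden, dim, 1)
        self.act = nn.GELU()

    def forward(self, x):                       # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)   # (B, H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        x = x + attn_out.transpose(1, 2).view(b, c, h, w)
        y = self.proj(self.dw(self.act(self.expand(x))))
        return x + y                            # inverted residual connection
```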